Integrated assessment system for standards-based assessments

ABSTRACT

A computer-integrated assessment system for standards-based assessments, wherein the assessments conform to recorded standards. The system includes algorithms for directing the generation of a plan of a series of customized assessments aligned to respective selected standards of the recorded standards, wherein the customized assessments are set for different times within a period of time, and algorithms for enabling a user to generate a test in one of the customized assessments, wherein one of the selected standards of the one customized assessment is displayed to the user during generation of questions for the test. During generation of tests, repetition of questions, or of questions on selected subject matter, can be prevented. Individuals or groups can be selected for the customized assessments. Accuracy in grading of answer sheets by scanning is improved by an algorithm determining the lightest and darkest answer mark to determine the intended answer. Additionally, student handheld devices can be used to answer questions in a test with automatic grading or scoring.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. Pat. Nos. 6,322,366, 6,468,085 and 7,065,516 and U.S. patent application Ser. No. 11/009,708, all of which patents and application are expressly incorporated in their entirety herein. This application claims benefit under 35 U.S.C. § 119(e)(1) of U.S. Provisional Patent Applications 60/963,675 and 60/963,676 which are expressly incorporated in their entirety herein. Additionally the disclosure in U.S. Published Patent Application 2003/00044762 is expressly incorporated in its entirety herein.

BACKGROUND OF THE INVENTION

A major challenge facing educational programs in the 21st century is to promote learning aimed at the achievement of valued goals or standards. To help educators meet this challenge, an educational management system has been designed to help programs promote goal-directed standards-based learning, for example as described in U.S. Pat. Nos. 6,322,366 and 6,468,085. The present patent application details additional innovations that enhance the usefulness of the system for learners involved in a variety of standards-based educational programs. A particularly important group of such learners comprises elementary and secondary school students receiving instruction aimed at the achievement of federal, state, and local standards.

In one educational management system, instruction to promote goal-directed learning is informed by assessment information indicating the capabilities that a learner has acquired and those that the learner will be ready to learn in the future. Item Response Theory (IRT) is used to estimate the probability that a learner will be ready to acquire capabilities reflecting goals that have not yet been mastered. There is room for innovations that enhance the construction of assessments and the use of assessment information to inform goal-directed standards-based learning.

BRIEF SUMMARY OF THE INVENTION

In the educational management system, instruction to promote goal-directed learning is informed by assessment information indicating the capabilities that a learner has acquired and those that the learner will be ready to learn in the future. Item Response Theory (IRT) is used to estimate the probability that a learner will be ready to acquire capabilities reflecting goals that have not yet been mastered. The present system includes innovations that enhance the construction of assessments and the use of assessment information to inform goal-directed standards-based learning.

The present invention can be summarized in an integrated assessment system for standards-based education. The system includes:

(a) Assessment planning within a Benchmark Planner, allowing for a series of customized assessments aligned to standards and delivered on a schedule determined by the user.

(b) Automatic benchmark test generation wherein these tests are: (i) Part of an overall benchmark plan that may cover up to an entire academic year. (ii) Able to restrict specific items from being re-tested if included in an earlier benchmark test during that planning period. (iii) Composed of test items aligned in accordance with the specifications used in the selected benchmark plan. (iv) Capable of optimizing overall printed test length by associating a flexible number of items with a block of text to reduce overall test length and minimize white space.

(c) Benchmark test review wherein the review process contains: (i) Agency review of tests constructed using the benchmark planner. (ii) Multi-phase review, allowing distinct groups to participate in the review process. (iii) Ability to align the items with the associated standard within the review process. (iv) Use of distinct phases of review, e.g. Not Reviewed, Accept, and Replace.

(d) Manual construction of tests using Test Builder, including: (i) The ability to hand-enter test questions with text, equations and images, import printable tests, or search a pre-populated test item bank. (ii) Ability to create a standards-based assessment by aligning items with a standard, or to circumvent standard alignment. (iii) Multi-phase construction, e.g. Construction phase, tryout phase, item review, and publication phase.

(e) Offline test administration including: (i) The ability to scan answer sheets using optical scanning technology using proprietary answer sheets, or to print and subsequently scan answer sheets on plain paper that have been filled out by the student. (ii) A scanner controlled by a client-side computer connected to the Internet. (iii) Software installed on the client computer, used to connect to the student assessment database to upload assessment item responses. (iv) Bar-coded answer sheets linking the answer sheet to a specific test ID. (v) An algorithm scoring lightest and darkest mark on an answer sheet, to enhance accuracy in determining what constitutes a marked response. (vi) The ability for the client to automatically submit images of scanned answer sheets in the event of a processing problem. (vii) Ability to use both offline and online administration within a single assessment.

(f) Online test administration, including: (i) Test entry through an online student center using dual password identification at the student login and test login levels, or through classroom administration using handheld input devices. (ii) Real-time progress updates for the student during test administration. (iii) Ability to administer tests with students using either individual computers or hand-held response pads, obviating the need to have a computer for each student in the testing environment. Test administration using response pads further including: (1) Software installed on a single client computer, connected to a response pad receiver. (2) A central display device, either an instructor's computer or whiteboard, to display test questions, or handheld devices capable of displaying test questions for the student. (3) Handheld response pads using wireless technology to transfer student responses to the client-side application managing the student response pads. (4) Automated transfer of data from the client-side application managing the student response pads to the assessment database. (iv) Test questions and answers transferred to the client computer after each response is saved, reducing the risk of data loss in the event of hardware failure.

(g) Test monitoring capability, with the ability for teachers to view individual student item responses, and notation of correct/incorrect responses by question.

(h) Combinatorial assessments, where standards mastery may be determined from diverse data sources such as class assignments and online/offline assessments.

(i) Combinatorial assessments make it possible to use item parameters along with a continuous ability score to compute the probability that a student will achieve any of a series of goals in a scale comprised of a set of goals.
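As a minimal sketch of the probability computation mentioned in the last item above, the following assumes a two-parameter logistic (2PL) IRT model. The text does not state which IRT model is used, so the model choice, the function name, the parameter names, and the sample values are all illustrative assumptions.

```python
import math


def mastery_probability(ability: float, discrimination: float, difficulty: float) -> float:
    """Probability of achieving a goal under an assumed 2PL IRT model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical goals on one scale: (name, discrimination, difficulty).
goals = [("goal_a", 1.2, -0.5), ("goal_b", 0.9, 0.4), ("goal_c", 1.5, 1.3)]
ability = 0.4  # hypothetical continuous ability score for one student

# One probability per goal in the scale.
probabilities = {name: mastery_probability(ability, a, b) for name, a, b in goals}
```

Under this assumed model, a goal whose difficulty equals the student's ability score has a probability of one half, and easier goals have higher probabilities.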

The invention also may be summarized in an integrative data analysis system to promote standards mastery. The system includes: (a) Innovations making it possible to: (i) Combine multiple tests into a single assessment, (ii) Combine parts of tests to make a new test, and (iii) Combine information from tests, class assignments, and other data sources into one scale. (b) The ability to estimate the test score needed to achieve standards mastery and to identify objectives to be mastered to achieve the required test score. (i) Required learning objective estimates are linked to the Benchmark Planner. (c) A Risk Assessment initiative to determine whether or not standards have been achieved. The initiative includes: (i) Predictive abilities to determine which students are on course to meet standards and which students are at risk for not meeting standards. (ii) A model basing estimates of the risk that students will not meet standards on data gathered during a previous year. (iii) Assessment of the validity of estimates based on computation of new estimates when new data have been collected on all measures involved in the assessment. (d) Use of multiple tests to determine standards mastery. Tests used to determine mastery may include multiple test types (e.g. benchmark, formative, and/or State test).

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1, 4, 7, 9, 11, 13, 15, 17, 20, 23, 28, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 56, 58, 60, 63, 65 and 69 are step diagrams setting forth the operation and construction of the integrated assessment system of the present invention.

FIGS. 2, 3, 5, 6, 8, 10, 12, 14, 16, 18, 19, 21, 22, 24, 25, 26, 27, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 51, 52, 53, 54, 55, 59, 61, 62, 64, 66, 67, 69 and 70 are illustrations of computer screens of the integrated assessment system of the present invention setting forth various steps of the above step diagrams.

DETAILED DESCRIPTION OF THE INVENTION

Standards-based education requires that instruction be targeted toward the achievement of shared goals articulated in standards. For example, currently educational agencies across the nation are targeting instruction toward the achievement of shared goals reflected in state and local standards. The effective pursuit of shared goals in a given educational agency (e.g., school district or charter school) calls for a coordinated agency-wide effort designed to insure consistency and continuity in instruction and assessment related to instruction. The desired consistency and continuity are typically reflected in the agency's curriculum and assessment plans, which spell out the goals of instruction and the sequence in which those goals will be pursued and assessed.

An integrated assessment system is developed to support the construction, scheduling, and administration of customized assessments based on input from large numbers of teachers and administrators who are working together toward the achievement of shared goals. The system supports agency-wide assessment planning covering multiple assessments during the school year. It integrates planning with test construction, agency-wide assessment review, and agency-wide and classroom based test scheduling. The system supports agency-wide online and/or offline test administration and automated and/or manual scoring.

Assessment Planning with Benchmark Planner

Two kinds of assessment planning occur in standards-based education. The most familiar involves planning to meet the needs of individual students in the classroom setting. For example, a teacher may construct a brief classroom quiz to measure specific capabilities currently targeted for instruction. Performance on the quiz may be used as a basis for recommending specific instructional activities designed to promote mastery of the targeted skills.

A second form of planning is long-range planning involving multiple assessments designed to assess student performance related to curriculum plans coordinating instruction across multiple grades for an entire school year. Technology supporting this type of planning has been lacking. An innovative tool called Assessment or Benchmark Planner has been developed to support long-range planning involving multiple assessments aligned to standards. The Assessment or Benchmark Planner makes it possible to plan a series of customized assessments aligned to standards and to schedule the delivery of those assessments at successive times across the school year. The planning process begins with the selection of the year during which the plan will be in effect, and the subject and associated standards to which the assessment items called for in the plan will be aligned. The selected standards are often state standards. However, agencies have the option of entering their own local standards using Scale Builder technology. A Plan Transfer feature allows the user to transfer a plan from a previous year to the current year. A Copy feature allows the user to make a copy of a plan. The Transfer and Copy features provide planning continuity across time and across related planning initiatives.

Many people typically have a stake in the development of Benchmark assessments. The Benchmark Planner allows the agency to initiate the process of giving people a voice in the development process by specifying individuals who will have the responsibility of reviewing draft assessments developed in accordance with the plan. This is accomplished using the Set Test Reviewers feature. The completed plan can also be printed and disseminated to interested parties.

For each benchmark test, the user has the option of specifying the number of items to be used to assess each standard (benchmark) to be measured on the test. For example, the user may indicate that four items should be selected to measure each standard. A projected delivery date is also automatically recorded for the assessment. The standards potentially available for assessment are displayed in a series of check boxes. The user checks the standards to be assessed on each benchmark. In addition, the user may override the number of items allotted to any individual standard. For example, suppose that the user has indicated that there should be four items for each standard assessed on the test. The user may override this global specification and choose, for instance, to include three items to measure achievement related to a particular standard.
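The plan specification described above (a global item count per standard, a set of checked standards, and per-standard overrides) can be sketched as a small data structure. This is an illustration only: the class name, field names, standard codes, and date are hypothetical and are not taken from the system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BenchmarkTestPlan:
    """Plan for one benchmark test (illustrative structure, not the system's schema)."""
    delivery_date: date
    default_items_per_standard: int
    selected_standards: list[str] = field(default_factory=list)
    overrides: dict[str, int] = field(default_factory=dict)

    def items_for(self, standard: str) -> int:
        """Per-standard count: the override if present, else the global default."""
        return self.overrides.get(standard, self.default_items_per_standard)

    def total_items(self) -> int:
        """Total number of items the plan calls for across the checked standards."""
        return sum(self.items_for(s) for s in self.selected_standards)


plan = BenchmarkTestPlan(
    delivery_date=date(2008, 10, 15),             # hypothetical projected delivery date
    default_items_per_standard=4,                 # global specification: four items each
    selected_standards=["M5.1", "M5.2", "M5.3"],  # hypothetical standard codes
    overrides={"M5.2": 3},                        # three items for one particular standard
)
```

With these sample values the plan calls for eleven items in total: four for each of two standards and three for the overridden one.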

When the benchmark plan for one or more assessments is complete, the user may indicate that the plan is complete and save the plan. The completion of all plans at a given grade level provides a multi-assessment benchmark plan covering an entire school year for that grade level. The completion of plans at multiple grade levels provides a global plan for assessments to be conducted at all of the selected grade levels.

Test Construction Using Generate Benchmark Tests

When a benchmark plan for a given test has been completed, a benchmark test may be automatically generated using the Generate Benchmark Tests feature. This feature is unique in important ways. It treats each benchmark test as part of an overall benchmark plan that generally covers an entire school year. This treatment makes it possible to impose useful restrictions on the entire series of tests associated with the plan. If an item has appeared on a previous benchmark assessment, it can be restricted from appearing on subsequent tests. Likewise, if a previous test includes one or more questions referring to a particular text, the text and any questions referring to it can be excluded from subsequent benchmark assessments. The restrictions help to insure that student performance reflects student skill and not merely responses to specific test items.
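The two restrictions described above can be sketched as a filter over an item bank. The record layout and names below are assumptions made for illustration; the text does not describe how items or texts are stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    item_id: str
    standard: str
    text_id: Optional[str] = None  # shared passage or image, if any (hypothetical field)


def eligible_items(bank, used_item_ids, used_text_ids):
    """Drop items already used in the plan and items tied to an already-used text."""
    return [
        item for item in bank
        if item.item_id not in used_item_ids
        and (item.text_id is None or item.text_id not in used_text_ids)
    ]


bank = [
    Item("q1", "S1"),
    Item("q2", "S1", text_id="passage_a"),
    Item("q3", "S1", text_id="passage_b"),
    Item("q4", "S2"),
]
# q1 appeared on an earlier benchmark test; passage_a was used on an earlier test.
remaining = eligible_items(bank, used_item_ids={"q1"}, used_text_ids={"passage_a"})
```

Here q1 is excluded because it was used before, and q2 is excluded because its text was used before, even though q2 itself never appeared.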

The process of generating a test is initiated by selecting the benchmark plan to be used in guiding the construction of the assessment. Next the user selects the item banks to be called upon to generate the test. The third step is to indicate a library which will be used to store and retrieve the test and to specify the subject and grade level for the test. Subject and grade-level information is used to automatically generate a title for the test. The final step is to press the Generate Test button.

When the Generate Test button is pressed, the system selects items aligned with standards in accordance with the specifications in the selected benchmark plan. It keeps items linked to a common text or image together, and it implements an algorithm to order items in ways that save paper required to print test booklets in the event that offline test administration is elected. If items needed to meet some of the requirements of the benchmark plan are unavailable, the system prints a report identifying the number of required items and indicating the standard(s) to which they are to be assigned. These reports can play an important role in guiding item development activities.
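A minimal sketch of this generation step follows. It selects items per standard, reports any shortfall, and keeps items that share a text adjacent to one another. The actual selection and paper-saving ordering algorithms are not described in the text, so the first-available selection rule, the grouping rule, and all names here are assumptions.

```python
from collections import defaultdict


def generate_test(required, bank):
    """required: {standard: count}; bank: list of (item_id, standard, text_id).

    Returns (ordered_items, shortfalls), where shortfalls maps each standard
    to the number of items the bank could not supply.
    """
    by_standard = defaultdict(list)
    for item in bank:
        by_standard[item[1]].append(item)

    selected, shortfalls = [], {}
    for standard, count in required.items():
        available = by_standard[standard][:count]  # assumed rule: take the first available
        selected.extend(available)
        if len(available) < count:
            shortfalls[standard] = count - len(available)

    # Standalone items first (in selection order), then each shared text's
    # items as one contiguous group, ordered by where the text first appeared.
    first_seen = {}
    for position, item in enumerate(selected):
        first_seen.setdefault(item[2], position)
    ordered = sorted(selected, key=lambda it: (it[2] is not None, first_seen[it[2]]))
    return ordered, shortfalls


bank = [
    ("q1", "S1", None), ("q2", "S1", "p1"),
    ("q3", "S2", None), ("q4", "S2", "p1"),
    ("q5", "S3", None),
]
ordered, shortfalls = generate_test({"S1": 2, "S2": 2, "S3": 2}, bank)
```

In this example the bank holds only one item for standard S3, so the shortfall report shows one missing item for S3, which is the kind of information that could guide item development.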

Benchmark Test Review

After a series of benchmark tests have been generated in the system, the agency for which the tests were constructed has the option of engaging in test review. Test review serves a special function in standards-based education because benchmark tests are designed to assess many students receiving instruction through the efforts of many educators. All of these educators have a stake in the assessment process. In order to meet the assessment needs of teachers and administrators responsible for student instruction, it is important for the views of all stakeholders to be represented in shaping benchmark assessments.

A two-phase test review process makes it possible for stakeholders to review assessments and achieve needed modifications prior to test publication and administration. The first phase is called initial review. Any number of reviewers can participate in the initial review process. For example, a school district might designate all fifth-grade teachers as initial reviewers of a fifth-grade benchmark math test. Alternatively, the district might form a review committee to serve as initial reviewers. The second phase is a final review. There is only one final reviewer. The final reviewer has access to all of the initial reviews. It is the final reviewer's responsibility to produce a review that guides construction of the final version of the benchmark test.

Both the initial and final reviews are carried out using the Test Review feature of the system. A review is initiated by selecting the benchmark assessment to be reviewed. The reviewer has the option of displaying certain categories of items. For example, initially a reviewer might wish to see all items. At a later time, a reviewer interested in checking his or her judgments might elect to see only those items that had been previously reviewed.

Each item to be reviewed is displayed along with the standard to which it is aligned. The review status of the item is displayed following the item. There are three status categories, which may be selected by the reviewer: Not Reviewed, Accept, and Replace. Not Reviewed is initially selected for all items that have not yet been reviewed. A comment box is provided below the three status categories.

The reviewer has the option of saving a review at any time. The reviewer also may delete the review. Finally, the reviewer may indicate that the review is complete. When a final reviewer indicates that a review is complete, a message is sent to the assessment staff. After receiving the message, an assessment staff member goes over the final review with the final reviewer. This part of the review process is included to facilitate the construction of reliable and valid benchmark assessments.

The assessment staff member activates a replace button, which appears next to each item for which the review status is Replace. When the button is pressed, the appropriate item bank is searched and possible replacement items are displayed. An item may then be selected from the list of replacement items and inserted into the test. When replacements are completed, the test is ready for publication.
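The replacement search can be sketched as follows. The three status names come from the text; the rule that candidates must be aligned to the same standard and must not already be on the test is an assumption, as are the function and variable names.

```python
from enum import Enum


class ReviewStatus(Enum):
    NOT_REVIEWED = "Not Reviewed"
    ACCEPT = "Accept"
    REPLACE = "Replace"


def replacement_candidates(test_items, final_review, bank):
    """For each item marked Replace, list bank items aligned to the same
    standard that are not already on the test (assumed search rule).

    test_items and bank are lists of (item_id, standard) pairs;
    final_review maps item_id to a ReviewStatus.
    """
    on_test = {item_id for item_id, _ in test_items}
    candidates = {}
    for item_id, standard in test_items:
        status = final_review.get(item_id, ReviewStatus.NOT_REVIEWED)
        if status is ReviewStatus.REPLACE:
            candidates[item_id] = [
                bank_id for bank_id, bank_standard in bank
                if bank_standard == standard and bank_id not in on_test
            ]
    return candidates


test_items = [("q1", "S1"), ("q2", "S2")]
final_review = {"q1": ReviewStatus.ACCEPT, "q2": ReviewStatus.REPLACE}
bank = [("q1", "S1"), ("q2", "S2"), ("q7", "S2"), ("q8", "S1"), ("q9", "S2")]
candidates = replacement_candidates(test_items, final_review, bank)
```

Only q2 was marked Replace, so only q2 receives a candidate list, from which a staff member would pick one item to insert into the test.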

Test Construction Using Test Builder

The system's assessment options include the capability to construct assessments using a feature called Test Builder. Test Builder is used mainly at the classroom level to enable teachers to construct class quizzes and formative assessments that guide instruction on objectives that are immediate targets for instruction. Test construction is initiated in Test Builder by entering the title of the test and selecting a test library in which to store the test for later retrieval. The next step is to enter general test-taking instructions. There are three basic options for entering items into the test. One option is to import a printable version of the entire test from a word-processing file. This option provides the capability to print test booklets and automatically score scanned answer sheets for assessments developed outside the system and administered offline. The second option is to construct items using text editing, equation editing, and image importation features provided in Test Builder. This option allows users to construct their own items and to edit or delete items from the test. The third option is to search recorded item banks. A variety of search options are available. Users may search for items aligned to a particular objective. They may conduct a keyword search of objectives, and they may search for groups of items all linked to the same text or image. Search features also include the capability to automatically generate all or part of a test by designating the objectives to be included in the test and the number of items to be included for each of the selected objectives. Items selected from the banks may be copied. The copies may be edited to customize the items to user needs.

In keeping with the standards-based approach to assessment, each item included in an assessment constructed using Test Builder is aligned to a standard. As indicated earlier, all items in the banks are aligned to standards. Thus, items selected from the banks for inclusion on a test constructed using Test Builder are aligned to standards. When a new item is constructed in Test Builder, the user is requested to select the standard to which the item is aligned. Although the system is specifically designed to support standards-based assessment, there are cases in which users may not wish to align items to standards. The system allows for the creation of tests that are not aligned to standards. This is accomplished by creating a dummy objective for each item on the test.

When a new test is developed, the test is automatically assigned a status labeled Construction Phase. When construction is completed, the user may make a series of changes in the status. Following construction, the user may change the test status to Tryout Phase. During the Tryout Phase the test may be scheduled for tryout administration, but scores will not be saved. The user also has the option to change test status to Item Review Phase. In the Item Review Phase the test may be subjected to test review in the manner described for benchmark tests. Finally, the user may change the status to Publication Phase. When test status is changed to Publication Phase, the test can be scheduled and examinee responses will be saved. After a test is published, it cannot be changed or deleted. This rule assures the ability to trace examinee responses back to the test that was actually taken. Although a published test cannot be changed, it can be copied, and the copy can be edited.
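The status rules above can be sketched as a small state model. The four phase names and the rules about scheduling, saving responses, and the immutability of published tests come from the text. The class and method names are hypothetical, and starting a copy in the Construction Phase is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    CONSTRUCTION = "Construction Phase"
    TRYOUT = "Tryout Phase"
    ITEM_REVIEW = "Item Review Phase"
    PUBLICATION = "Publication Phase"


@dataclass
class Assessment:
    title: str
    items: list = field(default_factory=list)
    phase: Phase = Phase.CONSTRUCTION  # every new test starts here

    def set_phase(self, new_phase: Phase) -> None:
        """Change status; a published test can no longer be changed."""
        if self.phase is Phase.PUBLICATION:
            raise ValueError("a published test cannot be changed")
        self.phase = new_phase

    def can_schedule(self) -> bool:
        """Tests may be scheduled for tryout or, once published, for real."""
        return self.phase in (Phase.TRYOUT, Phase.PUBLICATION)

    def saves_responses(self) -> bool:
        """Examinee responses are saved only for published tests."""
        return self.phase is Phase.PUBLICATION

    def copy(self, title: str) -> "Assessment":
        """An editable copy (assumed to start over in the Construction Phase)."""
        return Assessment(title=title, items=list(self.items))


quiz = Assessment("Quiz 1", ["q1", "q2"])
quiz.set_phase(Phase.TRYOUT)       # schedulable, but scores are not saved
quiz.set_phase(Phase.PUBLICATION)  # schedulable, responses saved, now frozen
revised = quiz.copy("Quiz 1 (revised)")
```

Freezing the published test while allowing copies preserves the ability to trace each response back to the exact test that was taken.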

Scheduling Tests

After a test has been published, it can be scheduled for online and/or offline administration. Scheduling options are included to accommodate benchmark and classroom formative assessments. Since benchmark tests are typically administered agency-wide, an approach that makes it possible to schedule tests for large numbers of schools and classes can save those involved in the scheduling process large amounts of time. A Bulk Scheduler feature makes it possible to schedule agency-wide assessments quickly and easily. Bulk Scheduler allows the user to schedule a test for all schools or selected schools in an agency. The user has the additional option of scheduling the test for all classes in the set of selected schools or selected classes in those schools. After schools and classes have been chosen, the user specifies the dates within which the test is scheduled for administration. The user may specify a user name and password for students who will be taking the test online. In addition, the user may specify a date for posting assessment results to appropriate audiences (e.g., students and parents).
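The school and class selection performed by Bulk Scheduler can be sketched as a function that expands one request into one schedule entry per class. The function name, the dictionary layout, and the sample school and class names are hypothetical.

```python
from datetime import date


def bulk_schedule(test_id, agency, start, end, schools=None, classes=None):
    """Create one schedule entry per selected class.

    agency:  {school_name: [class_name, ...]}
    schools: school names to include, or None for all schools.
    classes: class names to include, or None for all classes in those schools.
    """
    if end < start:
        raise ValueError("end date precedes start date")
    entries = []
    for school, school_classes in agency.items():
        if schools is not None and school not in schools:
            continue
        for class_name in school_classes:
            if classes is not None and class_name not in classes:
                continue
            entries.append({
                "test_id": test_id,
                "school": school,
                "class": class_name,
                "start": start,
                "end": end,
            })
    return entries


agency = {"North": ["5A", "5B"], "South": ["5C"]}
window = (date(2008, 10, 1), date(2008, 10, 10))  # hypothetical administration window
all_entries = bulk_schedule("math-bm-1", agency, *window)
north_only = bulk_schedule("math-bm-1", agency, *window, schools={"North"})
```

A single call covers every class in the agency, or a chosen subset, which is where the time saving over class-by-class scheduling would come from.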

Although benchmark tests are typically scheduled for large groups of students, other types of assessments such as class quizzes are appropriately scheduled at the class level. Test scheduling at the class level is accomplished using the Class Calendar feature. This feature allows the user to specify the dates for a class test, a user name and password for the test, and the dates when scores will be posted. As soon as scheduling information has been entered, it appears along with other events on the teacher's Class Calendar.

Offline Test Administration with Scanline

Assessment innovations incorporate a feature called Scanline that includes the ability to scan answer sheets in order to support offline test administration. The ability to scan answer sheets has been well established for many years. Scanline supports established optical scanning technology using proprietary answer sheets. Scanline also supports more recently developed technology making it possible to print and subsequently scan plain paper answer sheets.

The operation of Scanline requires a scanner controlled by a client-side computer connected to the Internet. Scanline software is downloaded from the Internet to the client machine. The software makes it possible to scan proprietary answer sheets and send information regarding examinee responses over the Internet to a server, which automatically scores the responses. In the case of plain paper scanning, the software may send information regarding student responses. However, scanned images may also be sent to a server.

Scanline includes a number of innovations that increase the ease of use of plain-paper scanning technology and that enhance the ability to detect and correct scanning errors using the plain paper approach. Scanline technology identifies form types and form characteristics automatically. This feature enhances ease of use because it supports dynamic form printing and scanning. When a user prints an answer sheet for a selected test, the answer sheet includes a barcode containing the test-administration ID. When the sheet is subsequently scanned, the barcode is read on the client machine. Web services are then called that indicate to the client the number of items on the test and the number of alternatives associated with each item. For example, the barcode might indicate that the test contained 35 items and that items 1 through 12 were true-false items and that the remaining items were 4-alternative multiple choice items. This information would be used to control the information processed by the scanner. Dynamic scanning minimizes the time required to process scanning information. For example, if there are only 35 items on a test, the scanner would process only 35 items even though the form type might be capable of including many more items than 35.
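Dynamic scanning can be sketched as follows. A dictionary stands in for the web service call, and the scanner output is modeled as rows of booleans. The identifier format, the function names, and the rule that a blank or multiply marked item yields no response are assumptions.

```python
# Stand-in for the web service response, keyed by test-administration ID.
# Each entry gives the number of alternatives for every item on the test.
LAYOUTS = {
    "ADMIN-1001": [2] * 12 + [4] * 23,  # 12 true-false items, then 23 four-choice items
}


def lookup_layout(administration_id):
    """Hypothetical replacement for the web service lookup described in the text."""
    return LAYOUTS[administration_id]


def read_responses(administration_id, marked):
    """marked: one row of booleans per bubble row on the physical form.

    Only the rows and columns the layout calls for are examined, so a
    35-item test on a larger form is processed as 35 items.
    """
    layout = lookup_layout(administration_id)
    responses = []
    for row, alternatives in zip(marked, layout):
        chosen = [i for i, is_marked in enumerate(row[:alternatives]) if is_marked]
        # Assumed rule: exactly one mark is a response; otherwise record None.
        responses.append(chosen[0] if len(chosen) == 1 else None)
    return responses


form = [[False] * 5 for _ in range(50)]  # a form type with room for 50 five-choice items
form[0][1] = True    # item 1: second alternative
form[12][3] = True   # item 13: fourth alternative
form[40][0] = True   # stray mark beyond the last item; never examined
responses = read_responses("ADMIN-1001", form)
```

The stray mark in row 41 is ignored because the layout for this administration covers only 35 items.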

One of the special problems associated with plain-paper scanning involves the determination of what constitutes a marked response. Plain-paper scanning necessarily involves printing answer sheets on multiple printers. The output of printers may vary substantially along the light-to-dark dimension. This fact creates circumstances in which an unmarked alternative printed on one printer may be much darker or lighter than the same alternative printed on another printer. One approach for determining whether or not an alternative has been marked is to set a darkness threshold expressed in pixels. If the alternative exceeds the threshold, it is classified as marked. Printer variation can make that approach unreliable. In the case in which printer output is dark, unmarked alternatives may exceed the threshold and be incorrectly classified as being marked.

To address the classification problem, an algorithm uses the lightest mark for an alternative as an anchor against which to judge other alternatives. The current implementation of the algorithm recognizes three categories along the light-dark dimension: light, regular, and dark. When the classification for the lightest bubble has been established, a multiplier is applied to establish the threshold for determining the percentage of pixels required in the annotated space to classify the bubble as marked. A different multiplier is used for each of the three categories in order to ensure adequate classification accuracy. The thresholds for the three categories and the multipliers are determined by empirical tests.
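A minimal sketch of the anchoring idea follows. The category cut-offs and multipliers below are invented for illustration, since the text says the real values are determined empirically. Whether the anchor is taken per item or per sheet is not stated; this sketch assumes per item.

```python
# Hypothetical values only; the text says the real category thresholds and
# multipliers are determined by empirical tests.
CATEGORY_UPPER_BOUNDS = [("light", 0.10), ("regular", 0.25), ("dark", 1.00)]
MULTIPLIERS = {"light": 3.0, "regular": 2.0, "dark": 1.5}


def classify_anchor(fill):
    """Place the lightest bubble's fill fraction into one of three categories."""
    for name, upper in CATEGORY_UPPER_BOUNDS:
        if fill <= upper:
            return name
    raise ValueError("fill fraction must be between 0 and 1")


def marked_alternatives(fills):
    """fills: fraction of dark pixels in each bubble of one item (0.0 to 1.0).

    The lightest bubble is the anchor; its category selects the multiplier
    that sets the marked/unmarked threshold for this item.
    """
    anchor = min(fills)
    threshold = anchor * MULTIPLIERS[classify_anchor(anchor)]
    return [i for i, fill in enumerate(fills) if fill > threshold]


light_printer = marked_alternatives([0.08, 0.09, 0.55, 0.08])
dark_printer = marked_alternatives([0.30, 0.32, 0.31, 0.80])
```

With a fixed threshold of 0.25, every bubble from the dark printer would be classified as marked; anchoring on the lightest bubble picks out only the filled one. This sketch presumes that at least one bubble in each item is left unmarked.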

When a user experiences a problem during plain-paper scanning, it is helpful for the user to be able to easily and clearly communicate the problem to technical support personnel. An innovative design helps users address scanning problems. Scanline stores images of scanned answer sheets on the client machine. When a user encounters a problem, the user opens up a scanning history feature. This feature shows each scanned image and the status of the image. For example, if there is a scanning problem, the history indicates that an error has occurred and specifies the nature of the error. The user then has the option of making needed adjustments and rescanning the sheet or submitting the image to a server. For example, if the sheet was initially scanned upside down, the user may choose to rescan it. On the other hand, if the nature of the problem is not readily apparent, the user may send the image to a technical support specialist for further review. The capability to pinpoint and view problem images of previously scanned sheets and to send them over the Internet with one click increases scanning accuracy and greatly simplifies the task of identifying scanning errors.

Online Assessment

Assessment options include online assessment as well as offline assessment. Moreover, both options are available within a single assessment. Online assessment is carried out in a virtual student center. A dual password approach provides two levels of security for online assessment. Each student is assigned a username and password that allows entry into the student center. A second username and password enables the student to enter the testing environment for the particular assessment that the student is scheduled to take. When the student logs into the test, his or her name appears at the top of the test. This helps proctors to insure that the students scheduled to take the test are actually the individuals who do take the test.

At the beginning of each online assessment, the student is provided with instructions explaining how to navigate through the assessment, how to indicate her/his response, and how to review questions. The online testing feature affords flexible navigation, which enables the student to go to any item at any time. Contextual materials such as narratives or charts appear above the question to which they are attached. If the amount of such material is extensive, it appears in a window that permits the student to use a scroll bar to view all of the contextual content. Students may respond to items in a variety of ways depending on the type of item. For example, to respond to items in a multiple-choice format, the student points and clicks on the “radio button” next to her/his response and then clicks Save My Answer. For these and all other questions, the system will automatically take the student to the next question once the save button is clicked.

As the student proceeds through the test, the numbers for questions already answered are in gray and those not yet answered are in blue. A test completion status bar indicates the proportion of items that have been completed, and a test summary screen lists the questions that have been answered and those that have not been answered. If the student has inadvertently omitted one or more items, the summary can alert the student to the omissions.
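By way of illustration only, the completion status and test summary could be computed as in the following sketch. The function name `completion_summary` and the data layout (a mapping from question number to saved answer) are hypothetical.

```python
def completion_summary(question_numbers, answers):
    """Split questions into answered and unanswered and report progress.

    `answers` maps a question number to the saved response; a missing key
    or a value of None means the question has not been answered.
    """
    answered = [q for q in question_numbers if answers.get(q) is not None]
    unanswered = [q for q in question_numbers if answers.get(q) is None]
    total = len(question_numbers)
    proportion = len(answered) / total if total else 0.0
    return {"answered": answered, "unanswered": unanswered, "proportion": proportion}


summary = completion_summary([1, 2, 3, 4], {1: "A", 3: "C"})
```

In this sketch the `proportion` value would drive the status bar, and a non-empty `unanswered` list would be used to alert the student to omitted items.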

Online Assessment with Mercury

Assessment innovations include a feature called Mercury that provides the ability to administer assessments using hand-held response pad systems. These systems allow students to enter responses to assessment questions on hand-held units. The units then wirelessly transmit the student's responses to a receiver, where they are read and recorded by Mercury on the instructor's computer in a classroom or computer lab environment. Mercury includes features that increase the ease and efficiency with which devices of this type may be used for administration of assessments. These innovations also help to ensure that data is accurately collected, particularly in the event of technical problems such as hardware failure.

Mercury includes an application that is installed on the computer located in the classroom being used to administer the assessments. The application has the ability to communicate directly and automatically with the Galileo database over the Internet via web services located on servers. Because this communication is seamless and automatic, it eliminates several of the steps that would otherwise be required of the user. The advantages of this approach are described in detail in the following discussion of the steps required for this type of assessment administration.

In order to administer an assessment, the first step that a teacher must take is to start the Mercury application. Once the program is started, it uses web services to communicate automatically with the Galileo database. The data that is passed to Mercury includes information such as the available students and the scheduled tests. The need for the teacher to take any extra steps to manually download information in order to start administration of a test is eliminated. For example, the teacher is not required to log into the Galileo servers and download the available tests before they can be selected for administration. The list of tests is automatically available.

The next step in administration is for the teacher to distribute the hand-held units to the students. Students then start working on the assessment using the units to enter their answers to the questions. Their responses are received by the Mercury program and saved to the Galileo database via web services. This process is entirely automatic and requires no action by the teacher. There is no need for manual uploading of student responses at the end of test administration. This increases the ease of use for the teacher because there are fewer tasks to perform. The accuracy of the resulting data is also increased because the teacher does not need to remember and successfully perform the steps of a manual upload of student responses. Because the responses are being recorded continually, there is also greater protection in the event of hardware failure. For example, should the hard drive on the computer running the Mercury program fail in the middle of test administration, the data loss would be quite limited. Any student responses that had been entered prior to the time of the failure will have already been recorded on the Galileo servers.
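The continual recording of responses can be sketched as follows. This is not the Mercury implementation: `ResponseRecorder`, `fake_upload`, and `server_store` are hypothetical names, and the upload function is a local stand-in for the web-service call to the Galileo database.

```python
from typing import Callable, Dict, Tuple

# Signature assumed for the upload call: (student_id, question, answer).
Upload = Callable[[str, int, str], None]


class ResponseRecorder:
    """Forwards each response to the server as soon as it is received."""

    def __init__(self, upload: Upload) -> None:
        self._upload = upload
        self.sent = 0

    def on_response(self, student_id: str, question: int, answer: str) -> None:
        # No local batching: the response leaves the classroom computer at once.
        self._upload(student_id, question, answer)
        self.sent += 1


# Stand-in for the remote database (hypothetical).
server_store: Dict[Tuple[str, int], str] = {}


def fake_upload(student_id: str, question: int, answer: str) -> None:
    server_store[(student_id, question)] = answer


recorder = ResponseRecorder(fake_upload)
recorder.on_response("s1", 1, "B")
recorder.on_response("s2", 1, "D")
del recorder  # simulate failure of the classroom computer mid-test
```

Because nothing is held for a later batch upload, the responses entered before the simulated failure remain in `server_store`.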

Test Monitoring

With both methods of online test administration, test monitoring is possible through the use of Galileo and Mercury. While a group of students is taking a test, teachers may log into the Galileo test administration screen or use links from within the Mercury application to view a monitoring screen showing all students currently taking the test, which questions have been answered, and whether each question was answered correctly or incorrectly.
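One possible way to build the rows of such a monitoring screen is sketched below. The function name `monitoring_rows` and the data shapes are assumptions made for illustration.

```python
def monitoring_rows(answer_key, responses):
    """Build one row per student for a monitoring view.

    Each cell is True (answered correctly), False (answered incorrectly),
    or None (not yet answered).
    """
    rows = {}
    for student, answers in responses.items():
        rows[student] = {
            question: (answers[question] == correct) if question in answers else None
            for question, correct in answer_key.items()
        }
    return rows


rows = monitoring_rows(
    {1: "A", 2: "C"},
    {"s1": {1: "A"}, "s2": {1: "B", 2: "C"}},
)
```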

Integrative Data Analysis System to Promote Standards Mastery

Standards-based assessment initiatives generally include assessment information gathered by multiple agencies. For example, under the No Child Left Behind Act, statewide assessments of standards mastery are required each year at specified grade levels. These tests are often accompanied by local standards-based assessment initiatives such as benchmark assessment programs implemented by local school districts. Although both types of assessment are typically aimed at measuring the mastery of state standards, the data from these assessments are generally not linked in ways that provide flexible data combinations to support accurate mastery classification and provide information that can be used to promote standards mastery. An integrative data analysis system links assessment data from local educational agencies to data from superordinate agencies such as state departments of education in ways that promote accurate mastery classification and the achievement of shared goals such as those reflected in state standards.

Two kinds of data play a key role in standards-based initiatives: continuous data and categorical data. It is virtually universal practice in standards-based initiatives to provide test scores for an assessment on a continuous distribution. Standard Item Response Theory (IRT) techniques (e.g. Thissen & Wainer, 2001) are used to score tests such as benchmark assessments. In standards-based initiatives, it is also customary to segment the score continuum into categories to determine mastery of standards (e.g. Cizek, 2001). For example, a score continuum might be segmented into categories such as exceeds the standard, meets the standard, approaches the standard, and falls far below the standard. Ability scores such as those yielded using IRT lend themselves to statistical techniques appropriate for use with continuous data. By contrast, mastery classifications call for statistical procedures appropriate for use with categorical data. The integrative data analysis system accommodates scores of both types.
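Segmenting a continuous score into mastery categories can be illustrated with a short sketch. The cut scores shown are invented for the example, and the rule that a score equal to a cut score falls in the higher category is an assumption; neither comes from the specification.

```python
from bisect import bisect_right

# Category labels from the text, ordered from lowest to highest.
CATEGORIES = [
    "falls far below the standard",
    "approaches the standard",
    "meets the standard",
    "exceeds the standard",
]


def classify(score, cut_scores):
    """Map a continuous score to a mastery category.

    `cut_scores` must be sorted ascending and hold one fewer value than
    there are categories. A score equal to a cut score is placed in the
    higher category (an assumption of this sketch).
    """
    if len(cut_scores) != len(CATEGORIES) - 1:
        raise ValueError("expected one cut score between each pair of categories")
    return CATEGORIES[bisect_right(cut_scores, score)]


HYPOTHETICAL_CUTS = [400, 500, 600]
```

The continuous score remains available for analyses suited to continuous data, while the label returned by `classify` is used wherever categorical procedures are required.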

Combinatorial Assessments

Innovative technology involving a continuous score distribution is related to technology described in U.S. Pat. No. 6,322,366 B1 (Nov. 27, 2001) and U.S. Pat. No. 6,468,085 B1 (Oct. 22, 2002) in which a continuous ability score is used along with item parameters to compute the probability that a student will achieve any of a series of goals in a scale comprised of a set of goals. This technology makes it possible to determine standards mastery from diverse data sources. For example, mastery of one standard in a scale might be determined by grades on a sample of work such as a class assignment. Mastery of another might be determined from an online assessment.

The present application introduces technology innovations based on existing ATI patents. These innovations make it possible to combine multiple tests into a single assessment, to combine parts of tests to make a new test, and to combine information from tests, class assignments, and other data sources into one scale. These combinatorial assessments can be utilized along with other assessment information to guide instruction toward the mastery of standards.

ATI's combinatorial innovations have a number of practical benefits. Combining data from different sources into a single assessment can be expected to increase the reliability of the assessment because test reliability is a direct function of test length (e.g. Nunnally & Bernstein, 1994). The data sources being combined are those directly linked to instruction. For example, class assignments and class quizzes are routine features of instruction. When data gleaned from these instructional mainstays is combined in ways that yield psychometrically sound assessments, it is possible to assess the relationship between those assessments and other high-stakes assessments such as statewide tests used to determine student mastery of standards. The relationship between combinatorial assessments and high-stakes assessments provides a measure of the extent to which performance measured as part of instruction is assessing the same thing as high-stakes measures of student performance used to evaluate schools and students.
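The dependence of reliability on test length noted above is commonly expressed by the Spearman-Brown formula. The specification does not name a formula, so its use here is an assumption offered for illustration. If a test with reliability \rho is lengthened by a factor k using comparable items, the predicted reliability \rho_k is

\[
\rho_k = \frac{k\,\rho}{1 + (k - 1)\,\rho}
\]

For example, doubling (k = 2) a test whose reliability is 0.70 gives \rho_k = 1.40 / 1.70, or approximately 0.82.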

In some cases it is useful to combine parts of benchmark tests with short quizzes. For example, a benchmark test may provide information suggesting an intervention targeting certain capabilities. After the intervention, a short formative assessment may be given to determine whether or not the targeted capabilities have been acquired. It may be useful to create and score a new combinatorial test substituting scores on items from the quiz for corresponding scores on items from the benchmark. The new test might then be used to revise estimates of the risk that students may have of not meeting standards as measured on a statewide test.
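The substitution step can be sketched as follows. The function name `combine_scores` and the mapping from quiz items to benchmark items are hypothetical; the specification does not prescribe a data format.

```python
def combine_scores(benchmark, quiz, item_map):
    """Return benchmark item scores with quiz scores substituted.

    `benchmark` and `quiz` map item identifiers to scores. `item_map`
    maps a quiz item to the benchmark item it corresponds to. Benchmark
    items with no corresponding quiz score are left unchanged.
    """
    combined = dict(benchmark)  # copy, so the original scores are preserved
    for quiz_item, benchmark_item in item_map.items():
        if quiz_item in quiz and benchmark_item in combined:
            combined[benchmark_item] = quiz[quiz_item]
    return combined


benchmark_scores = {"B1": 0, "B2": 1, "B3": 0}
quiz_scores = {"Q1": 1, "Q2": 1}
combined = combine_scores(benchmark_scores, quiz_scores, {"Q1": "B1", "Q2": "B3"})
```

The combined item scores would then be rescored as a new test, and the resulting ability estimate used to revise the estimated risk of not meeting standards.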

Ability scores computed from combinatorial assessments can play an important role in guiding instruction. The continuous IRT ability score and item parameters can be used to estimate the probability that a student will be able to perform tasks reflecting the mastery of particular standards. This information can be used to determine what capabilities to target for instruction.
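A minimal sketch of this probability estimate follows. It assumes a three-parameter logistic IRT model, which the specification does not specify; the function names, the item parameters, and the 0.5 threshold are likewise illustrative assumptions.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """Probability of success under a three-parameter logistic model.

    theta: student ability; a: discrimination; b: difficulty;
    c: lower asymptote (guessing). The choice of model is an assumption.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def standards_to_target(theta, items, threshold=0.5):
    """List standards whose estimated success probability is below threshold.

    `items` maps a standard identifier to its (a, b, c) item parameters.
    """
    return [
        standard
        for standard, (a, b, c) in items.items()
        if p_correct(theta, a, b, c) < threshold
    ]


# Hypothetical item parameters for two standards.
ITEMS = {"S1": (1.0, -1.0, 0.0), "S2": (1.0, 1.0, 0.0)}
targets = standards_to_target(0.0, ITEMS)
```

For a student of ability 0.0, the easier standard S1 is likely to be mastered while S2 is not, so only S2 is returned as a candidate for instruction.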

Many variations, modifications and changes may be made in the above described example without departing from the scope and spirit of the invention. 

1. A computer integrated assessment system for standards-based assessments wherein the assessments conform to recorded standards, the system comprising: means for directing the generation of a plan of a series of customized assessments aligned to respective selected standards of the recorded standards wherein each of the customized assessments are set for different times within a period of time; and means for enabling a user to generate a test in one of the customized assessments wherein one of the selected standards of the one customized assessment is displayed to the user during generation of questions for the test.
 2. A computer integrated assessment system as defined in claim 1 further comprising: means for preventing repeating a question in subsequent assessments.
 3. A computer integrated assessment system as defined in claim 1 further comprising: means for allowing review of tests for each customized assessment by respective selected participants.
 4. A computer integrated assessment system as defined in claim 1 further including the ability to scan test answer sheets wherein the lightest and darkest answer marks are determined and used to determine the marked answer for a question.
 5. A computer integrated assessment system as defined in claim 1 further including means to administer tests using student handheld input devices. 